Results 1 - 20 of 20
1.
Neural Netw ; 171: 159-170, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38091760

ABSTRACT

Nuclei detection is one of the most fundamental and challenging problems in histopathological image analysis, as localizing nuclei provides effective support for computer-aided cancer diagnosis, treatment decisions, and prognosis. Fully-supervised nuclei detectors require a large number of nuclei annotations on high-resolution digital images, which is time-consuming and requires human annotators with professional knowledge. In recent years, weakly-supervised learning has attracted significant attention for reducing the labeling burden. However, detecting dense nuclei with complex crowded distributions and diverse appearances remains a challenge. To solve this problem, we propose a novel point-supervised dense nuclei detection framework that introduces position-based anchor optimization to complement morphology-based pseudo-label supervision. Specifically, we first generate cellular-level pseudo labels (CPL) for the detection head via a morphology-based mechanism, which helps to build a baseline point-supervised detection network. Then, considering the crowded distribution of dense nuclei, we propose a mechanism called Position-based Anchor-quality Estimation (PAE), which utilizes the positional deviation between an anchor and its corresponding point label to suppress low-quality detections far from each nucleus. Finally, to better handle the diverse appearances of nuclei, an Adaptive Anchor Selector (AAS) operation is proposed to automatically select positive and negative anchors according to the morphological and positional statistical characteristics of nuclei. We conduct comprehensive experiments on two widely used benchmarks, MO and Lizard, using ResNet50 and PVTv2 as backbones. The results demonstrate that the proposed approach is superior to other state-of-the-art methods. In particular, in dense nuclei scenarios, our method achieves 95.1% of the performance of the fully-supervised approach. The code is available at https://github.com/NucleiDet/DenseNucleiDet.
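
To make the positional idea concrete, here is a minimal, hypothetical sketch (in NumPy, not the authors' PAE code) of how an anchor-quality score could be derived from the distance between an anchor and its nearest point label; the function name, Gaussian decay, and sigma value are illustrative assumptions.

```python
# Illustrative sketch of position-based anchor-quality weighting (not the
# authors' PAE implementation): anchors far from their nearest point label
# receive a lower quality score, which can be used to suppress low-quality
# detections in crowded regions. All names and constants are hypothetical.
import numpy as np

def anchor_quality(anchor_centers, point_labels, sigma=8.0):
    """anchor_centers: (N, 2) array of (x, y); point_labels: (M, 2) nucleus points."""
    diff = anchor_centers[:, None, :] - point_labels[None, :, :]   # (N, M, 2)
    dist = np.linalg.norm(diff, axis=-1)                           # (N, M)
    nearest = dist.min(axis=1)                                     # distance to closest nucleus
    # Gaussian decay: quality ~ 1 near the point label, ~ 0 far away.
    return np.exp(-(nearest ** 2) / (2.0 * sigma ** 2))

anchors = np.array([[10.0, 12.0], [40.0, 41.0]])
points = np.array([[11.0, 11.0]])
print(anchor_quality(anchors, points))  # high score for the close anchor, low for the far one
```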


Subjects
Benchmarking, Computer-Aided Diagnosis, Humans, Computer-Assisted Image Processing, Knowledge, Supervised Machine Learning
2.
IEEE Trans Cybern ; PP, 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37943655

ABSTRACT

Salient instance segmentation (SIS) is an emerging field that evolves from salient object detection (SOD), aiming at identifying individual salient instances using segmentation maps. Inspired by the success of dynamic convolutions in segmentation tasks, this article introduces a keypoints-based SIS network (KepSalinst). It employs multiple keypoints, that is, the center and several peripheral points of an instance, as effective geometrical guidance for dynamic convolutions. The features at peripheral points can help roughly delineate the spatial extent of the instance and complement the information inside the central features. To fully exploit the complementary components within these features, we design a differentiated patterns fusion (DPF) module. This ensures that the resulting dynamic convolutional filters formed by these features are sufficiently comprehensive for precise segmentation. Furthermore, we introduce a high-level semantic guided saliency (HSGS) module. This module enhances the perception of saliency by predicting a map for the input image to estimate a saliency score for each segmented instance. On four SIS datasets (ILSO, SOC, SIS10K, and COME15K), our KepSalinst outperforms all previous models qualitatively and quantitatively.
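
As a rough illustration of how keypoint features can steer dynamic convolutions, the following sketch (an assumed PyTorch setup, not the KepSalinst implementation) generates a per-instance 1x1 kernel from a keypoint descriptor and applies it to a shared mask feature map; all layer sizes and names are hypothetical.

```python
# Minimal sketch of a dynamic-convolution mask head driven by keypoint features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMaskHead(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Maps the instance's keypoint descriptor to a 1x1 conv kernel (+ bias).
        self.controller = nn.Linear(feat_dim, feat_dim + 1)

    def forward(self, mask_feats, keypoint_feat):
        # mask_feats: (1, C, H, W) shared features; keypoint_feat: (C,) per instance.
        params = self.controller(keypoint_feat)
        weight = params[:-1].view(1, -1, 1, 1)           # (1, C, 1, 1) dynamic kernel
        bias = params[-1:].view(1)
        return torch.sigmoid(F.conv2d(mask_feats, weight, bias))  # (1, 1, H, W) instance mask

head = DynamicMaskHead()
mask = head(torch.randn(1, 64, 32, 32), torch.randn(64))
print(mask.shape)  # torch.Size([1, 1, 32, 32])
```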

3.
IEEE Trans Image Process ; 32: 4472-4485, 2023.
Article in English | MEDLINE | ID: mdl-37335801

ABSTRACT

Due to the light absorption and scattering induced by the water medium, underwater images usually suffer from degradation problems such as low contrast, color distortion, and blurred details, which aggravate the difficulty of downstream underwater understanding tasks. Therefore, how to obtain clear and visually pleasant images has become a common concern, and the task of underwater image enhancement (UIE) has emerged in response. Among existing UIE methods, Generative Adversarial Network (GAN)-based methods perform well in visual aesthetics, while physical model-based methods have better scene adaptability. Inheriting the advantages of both types of models, we propose a physical model-guided GAN for UIE in this paper, referred to as PUGAN. The entire network follows the GAN architecture. On the one hand, we design a Parameters Estimation subnetwork (Par-subnet) to learn the parameters for physical model inversion, and use the generated color enhancement image as auxiliary information for the Two-Stream Interaction Enhancement sub-network (TSIE-subnet). Meanwhile, we design a Degradation Quantization (DQ) module in the TSIE-subnet to quantize scene degradation, thereby reinforcing the enhancement of key regions. On the other hand, we design dual discriminators for the style-content adversarial constraint, promoting the authenticity and visual aesthetics of the results. Extensive experiments on three benchmark datasets demonstrate that our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics. The code and results are available at https://rmcong.github.io/proj_PUGAN.html.
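
For context, many physical model-based UIE methods build on the simplified underwater image formation model I = J·t + B·(1 - t); the sketch below (a generic inversion in NumPy, not PUGAN's Par-subnet) shows how an estimated transmission map and background light could yield a physics-based enhanced image that a GAN branch might then refine.

```python
# Simplified underwater image formation model inversion (illustrative only):
# I = J * t + B * (1 - t), where t is the medium transmission and B the
# background (veiling) light; solving for J gives a color-corrected estimate.
import numpy as np

def invert_formation_model(image, transmission, background, t_min=0.1):
    """image: (H, W, 3) in [0, 1]; transmission: (H, W); background: (3,)."""
    t = np.clip(transmission, t_min, 1.0)[..., None]      # avoid division blow-up
    scene = (image - background) / t + background          # J = (I - B) / t + B
    return np.clip(scene, 0.0, 1.0)

img = np.random.rand(4, 4, 3)
t = np.full((4, 4), 0.6)
B = np.array([0.1, 0.4, 0.5])   # bluish-green veiling light, typical underwater (assumed values)
print(invert_formation_model(img, t, B).shape)  # (4, 4, 3)
```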

4.
IEEE Trans Image Process ; 32: 3027-3039, 2023.
Article in English | MEDLINE | ID: mdl-37192028

ABSTRACT

In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization. In addition, a transformer shows an exponential increase in the inference, training, and debugging times. Considering a wave function representation, we propose the WaveNet architecture that adopts a novel vision task-oriented wavelet-based MLP for feature extraction to perform salient object detection in RGB (red-green-blue)-thermal infrared images. In addition, we apply knowledge distillation to a transformer as an advanced teacher network to acquire rich semantic and geometric information and guide WaveNet learning with this information. Following the shortest-path concept, we adopt the Kullback-Leibler distance as a regularization term for the RGB features to be as similar to the thermal infrared features as possible. The discrete wavelet transform allows for the examination of frequency-domain features in a local time domain and time-domain features in a local frequency domain. We apply this representation ability to perform cross-modality feature fusion. Specifically, we introduce a progressively cascaded sine-cosine module for cross-layer feature fusion and use low-level features to obtain clear boundaries of salient objects through the MLP. Results from extensive experiments indicate that the proposed WaveNet achieves impressive performance on benchmark RGB-thermal infrared datasets. The results and code are publicly available at https://github.com/nowander/WaveNet.
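
As an illustration of the Kullback-Leibler regularization idea mentioned above, here is a hedged PyTorch sketch (not the WaveNet code) that treats the RGB and thermal feature maps as spatial distributions and penalizes their divergence; the softmax normalization and reduction choices are assumptions.

```python
# Hedged sketch of a KL-based alignment term between RGB and thermal features.
import torch
import torch.nn.functional as F

def kl_alignment_loss(rgb_feat, thermal_feat):
    # rgb_feat, thermal_feat: (B, C, H, W); compare channel-wise spatial distributions.
    b, c, h, w = rgb_feat.shape
    p_log = F.log_softmax(rgb_feat.view(b, c, -1), dim=-1)   # RGB side as log-probabilities
    q = F.softmax(thermal_feat.view(b, c, -1), dim=-1)       # thermal side as target distribution
    return F.kl_div(p_log, q, reduction='batchmean')

loss = kl_alignment_loss(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
print(loss.item())
```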

5.
Article in English | MEDLINE | ID: mdl-37018573

ABSTRACT

Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality (VR) technology, 360° omnidirectional images have been widely used, but the SOD task in 360° omnidirectional images is seldom studied due to their severe distortions and complex scenes. In this article, we propose a multi-projection fusion and refinement network (MPFR-Net) to detect the salient objects in 360° omnidirectional images. Different from existing methods, the equirectangular projection (EP) image and four corresponding cube-unfolding (CU) images are fed into the network simultaneously as inputs, where the CU images not only provide supplementary information for the EP image but also ensure the object integrity of the cube-map projection. In order to make full use of these two projection modes, a dynamic weighting fusion (DWF) module is designed to adaptively integrate the features of different projections in a complementary and dynamic manner from both inter- and intra-feature perspectives. Furthermore, in order to fully explore the interaction between encoder and decoder features, a filtration and refinement (FR) module is designed to suppress redundant information within and between features. Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively. The code and results are available at https://rmcong.github.io/proj_MPFRNet.html.
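
The dynamic weighting idea can be sketched as follows (an illustrative PyTorch module with assumed layer sizes, not the actual DWF implementation): channel-wise weights predicted from both projection streams are used to mix the EP and CU features in a complementary way.

```python
# Minimal sketch of dynamically weighted fusion of two projection streams.
import torch
import torch.nn as nn

class DynamicWeightFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )

    def forward(self, ep_feat, cu_feat):
        # ep_feat, cu_feat: (B, C, H, W) equirectangular / cube-unfolded features.
        stats = torch.cat([self.pool(ep_feat), self.pool(cu_feat)], dim=1).flatten(1)
        w = self.fc(stats).unsqueeze(-1).unsqueeze(-1)        # (B, C, 1, 1) weights in [0, 1]
        return w * ep_feat + (1.0 - w) * cu_feat               # complementary mix

fuse = DynamicWeightFusion()
out = fuse(torch.randn(2, 64, 16, 32), torch.randn(2, 64, 16, 32))
print(out.shape)  # torch.Size([2, 64, 16, 32])
```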

6.
IEEE Trans Cybern ; 53(3): 1920-1931, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35867373

ABSTRACT

The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. Therefore, how to effectively extract interimage correspondence is crucial for the CoSOD task. In this article, we propose a global-and-local collaborative learning (GLNet) architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM) to capture the comprehensive interimage corresponding relationship among different images from the global and local perspectives. First, we treat different images as different time slices and use 3-D convolution to integrate all intrafeatures intuitively, which can more fully extract the global group semantics. Second, we design a pairwise correlation transformation (PCT) to explore similarity correspondence between pairwise images and combine the multiple local pairwise correspondences to generate the local interimage relationship. Third, the interimage relationships of the GCM and LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive interimage collaboration cues. Finally, the intra and inter features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms 11 state-of-the-art competitors trained on some large datasets (about 8k-200k images).
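
The "images as time slices" idea can be illustrated with a short PyTorch snippet (a generic Conv3d aggregation, not GLNet's GCM): per-image 2-D feature maps are stacked along a pseudo-temporal axis so that a 3-D convolution mixes information across the whole query group.

```python
# Sketch of group-semantics extraction by treating group images as time slices.
import torch
import torch.nn as nn

n_images, channels, h, w = 5, 64, 28, 28
per_image_feats = torch.randn(n_images, channels, h, w)    # features of one query group

# Conv3d expects (batch, channels, depth, height, width); depth = images in the group.
group = per_image_feats.permute(1, 0, 2, 3).unsqueeze(0)    # (1, C, N, H, W)
conv3d = nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1)
group_semantics = conv3d(group)                              # inter-image mixing
print(group_semantics.shape)                                 # torch.Size([1, 64, 5, 28, 28])
```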

7.
IEEE Trans Cybern ; 53(1): 539-552, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35417369

ABSTRACT

Optical remote sensing images (RSIs) have been widely used in many applications, and one of the interesting issues in optical RSIs is salient object detection (SOD). However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades significantly. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds, while neglecting the importance of edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, whose key component is the edge-aware position attention unit (EPAU). First, the encoder is used to give salient objects a good representation, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shaped architecture, which not only provides accurate salient edge clues but also ensures the integrity of edge information by additionally deploying the intra-connection. That is to say, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention about salient objects, where position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, we obtain the final saliency map by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming the state-of-the-art SOD models.

8.
IEEE Trans Image Process ; 31: 6800-6815, 2022.
Article in English | MEDLINE | ID: mdl-36288228

ABSTRACT

Focusing on how to effectively capture and utilize cross-modality information in the RGB-D salient object detection (SOD) task, we present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement. For the cross-modality interaction, 1) a progressive attention-guided integration unit is proposed to sufficiently integrate RGB-D feature representations in the encoder stage, and 2) a convergence aggregation structure is proposed, which flows the RGB and depth decoding features into the corresponding RGB-D decoding streams via an importance-gated fusion unit in the decoder stage. For the cross-modality refinement, we insert a refinement middleware structure between the encoder and the decoder, in which the RGB, depth, and RGB-D encoder features are further refined by successively applying a self-modality attention refinement unit and a cross-modality weighting refinement unit. Finally, with the gradually refined features, we predict the saliency map in the decoder stage. Extensive experiments on six popular RGB-D SOD benchmarks demonstrate that our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively. The code and results are available at https://rmcong.github.io/proj_CIRNet.html.
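
The importance-gated fusion mentioned above follows a common gating pattern; the sketch below (assumed shapes and layers, not the CIR-Net unit) predicts a per-pixel gate from the concatenated RGB and depth features and uses it to blend the two modalities.

```python
# Hedged sketch of an importance-gated fusion of RGB and depth features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, H, W) gate in [0, 1]
        return g * rgb_feat + (1.0 - g) * depth_feat

fusion = GatedFusion()
fused = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```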

9.
IEEE J Biomed Health Inform ; 26(8): 4090-4099, 2022 08.
Article in English | MEDLINE | ID: mdl-35536816

ABSTRACT

Clinically, proper polyp localization in endoscopy images plays a vital role in follow-up treatment (e.g., surgical planning). Deep convolutional neural networks (CNNs) provide a favourable prospect for automatic polyp segmentation and avoid the limitations of visual inspection, e.g., subjectivity and overwork. However, most existing CNN-based methods often provide unsatisfactory segmentation performance. In this paper, we propose a novel boundary constraint network, namely BCNet, for accurate polyp segmentation. The success of BCNet benefits from integrating cross-level context information and leveraging edge information. Specifically, to avoid the drawbacks caused by simple feature addition or concatenation, BCNet applies a cross-layer feature integration strategy (CFIS) to fuse the features of the three highest layers, yielding better performance. CFIS consists of three attention-driven cross-layer feature interaction modules (ACFIMs) and two global feature integration modules (GFIMs). ACFIM adaptively fuses the context information of the three highest layers via the self-attention mechanism instead of direct addition or concatenation. GFIM integrates the fused information across layers with guidance from global attention. To obtain accurate boundaries, BCNet introduces a bilateral boundary extraction module that explores the polyp and non-polyp information of the shallow layer collaboratively based on the high-level location information and boundary supervision. Through joint supervision of the polyp area and boundary, BCNet is able to obtain more accurate polyp masks. Experimental results on three public datasets show that the proposed BCNet outperforms seven state-of-the-art competing methods in terms of both effectiveness and generalization.
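
The joint area-and-boundary supervision can be illustrated with a small PyTorch sketch; the boundary extraction by morphological gradient and the loss weighting are assumptions for illustration, not BCNet's exact formulation.

```python
# Sketch of joint supervision on the segmentation mask and its boundary.
import torch
import torch.nn.functional as F

def boundary_from_mask(mask, kernel=3):
    # Morphological gradient: dilation minus erosion of the binary mask.
    pad = kernel // 2
    dilated = F.max_pool2d(mask, kernel, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel, stride=1, padding=pad)
    return (dilated - eroded).clamp(0, 1)

def joint_loss(area_logits, boundary_logits, gt_mask, boundary_weight=1.0):
    area_loss = F.binary_cross_entropy_with_logits(area_logits, gt_mask)
    gt_boundary = boundary_from_mask(gt_mask)
    edge_loss = F.binary_cross_entropy_with_logits(boundary_logits, gt_boundary)
    return area_loss + boundary_weight * edge_loss

mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = joint_loss(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64), mask)
print(loss.item())
```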


Subjects
Computer-Assisted Image Processing, Neural Networks (Computer), Humans, Computer-Assisted Image Processing/methods
10.
IEEE Trans Image Process ; 30: 9179-9192, 2021.
Article in English | MEDLINE | ID: mdl-34739374

ABSTRACT

RGB-D saliency detection has been receiving increasing attention in recent years. Many efforts have been devoted to this area, most of which try to integrate multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of them ignore the inherent difference between the two modalities, which leads to performance degradation when handling challenging scenes. Therefore, in this paper, we propose a novel RGB-D saliency model, namely Dynamic Selective Network (DSNet), to perform salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire the high-level semantic information, which can be used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling-based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive and excellent performance against 17 current state-of-the-art RGB-D SOD models.


Subjects
Algorithms, Semantics
11.
IEEE Trans Image Process ; 30: 4985-5000, 2021.
Article in English | MEDLINE | ID: mdl-33961554

ABSTRACT

Underwater images suffer from color casts and low contrast due to wavelength- and distance-dependent attenuation and scattering. To solve these two degradation issues, we present an underwater image enhancement network via medium transmission-guided multi-color space embedding, called Ucolor. Concretely, we first propose a multi-color space encoder network, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure. Coupled with an attention mechanism, the most discriminative features extracted from multiple color spaces are adaptively integrated and highlighted. Inspired by underwater imaging physical models, we design a medium transmission (indicating the percentage of the scene radiance reaching the camera)-guided decoder network to enhance the response of the network towards quality-degraded regions. As a result, our network can effectively improve the visual quality of underwater images by exploiting multi-color space embedding and the advantages of both physical model-based and learning-based methods. Extensive experiments demonstrate that our Ucolor achieves superior performance against state-of-the-art methods in terms of both visual quality and quantitative metrics. The code is publicly available at: https://li-chongyi.github.io/Proj_Ucolor.html.
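
The medium transmission guidance can be sketched as a simple feature reweighting (an assumption about one way to wire it up, not Ucolor's decoder): regions with low transmission, i.e., stronger degradation, receive amplified feature responses.

```python
# Hedged sketch of transmission-guided feature modulation.
import torch
import torch.nn.functional as F

def transmission_guided(features, transmission):
    # features: (B, C, H, W); transmission: (B, 1, H0, W0) in [0, 1],
    # where low transmission means stronger degradation.
    t = F.interpolate(transmission, size=features.shape[-2:], mode='bilinear',
                      align_corners=False)
    degradation = 1.0 - t                    # emphasize quality-degraded regions
    return features * (1.0 + degradation)    # residual-style reweighting

feats = torch.randn(2, 64, 32, 32)
trans = torch.rand(2, 1, 128, 128)
print(transmission_guided(feats, trans).shape)  # torch.Size([2, 64, 32, 32])
```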

12.
IEEE Trans Cybern ; 51(1): 88-100, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32078571

ABSTRACT

Salient object detection from RGB-D images is an important yet challenging vision task, which aims at detecting the most distinctive objects in a scene by combining color information and depth constraints. Unlike prior fusion schemes, we propose an attention steered interweave fusion network (ASIF-Net) to detect salient objects, which progressively integrates cross-modal and cross-level complementarity from the RGB image and the corresponding depth map via steering of an attention mechanism. Specifically, the complementary features from RGB-D images are jointly extracted and hierarchically fused in a dense and interweaved manner. Such a manner breaks down the barriers of inconsistency existing in the cross-modal data and also sufficiently captures the complementarity. Meanwhile, an attention mechanism is introduced to locate the potential salient regions in an attention-weighted fashion, which helps to highlight the salient objects and suppress the cluttered background regions. Instead of focusing only on pixelwise saliency, we also ensure that the detected salient objects have objectness characteristics (e.g., complete structure and sharp boundary) by incorporating adversarial learning, which provides a global semantic constraint for RGB-D salient object detection. Quantitative and qualitative experiments demonstrate that the proposed method performs favorably against 17 state-of-the-art saliency detectors on four publicly available RGB-D salient object detection datasets. The code and results of our method are available at https://github.com/Li-Chongyi/ASIF-Net.

13.
IEEE Trans Image Process ; 30: 7012-7024, 2021.
Article in English | MEDLINE | ID: mdl-33141667

ABSTRACT

There are two main issues in RGB-D salient object detection: (1) how to effectively integrate the complementarity from the cross-modal RGB-D data; (2) how to prevent the contamination effect from an unreliable depth map. In fact, these two problems are linked and intertwined, but previous methods tend to focus only on the first problem and ignore the quality of the depth map, which may cause the model to fall into a sub-optimal state. In this paper, we address these two issues synergistically in a holistic model and propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity. By introducing depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner and guide the fusion process of the two modalities to prevent contamination. The gated multi-modality attention module in the fusion process exploits the attention mechanism with a gate controller to capture long-range dependencies from a cross-modal perspective. Experimental results compared with 16 state-of-the-art methods on 8 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively. The code is available at https://github.com/JosephChenHub/DPANet.

14.
IEEE Trans Image Process ; 30: 1305-1317, 2021.
Article in English | MEDLINE | ID: mdl-33306467

ABSTRACT

Despite the remarkable advances in visual saliency analysis for natural scene images (NSIs), salient object detection (SOD) for optical remote sensing images (RSIs) still remains an open and challenging problem. In this paper, we propose an end-to-end Dense Attention Fluid Network (DAFNet) for SOD in optical RSIs. A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships, and is further embedded in a Dense Attention Fluid (DAF) structure that enables shallow attention cues to flow into deep layers to guide the generation of high-level feature attention maps. Specifically, the GCA module is composed of two key components, where the global feature aggregation module achieves mutual reinforcement of salient feature embeddings from any two spatial locations, and the cascaded pyramid attention module tackles the scale variation issue by building up a cascaded pyramid framework to progressively refine the attention map in a coarse-to-fine manner. In addition, we construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations, which is currently the largest publicly available benchmark. Extensive experiments demonstrate that our proposed DAFNet significantly outperforms existing state-of-the-art SOD competitors. The code is available at https://github.com/rmcong/DAFNet_TIP20.

15.
Article in English | MEDLINE | ID: mdl-32746245

ABSTRACT

The ability to synthesize multi-modality data is highly desirable for many computer-aided medical applications, e.g., clinical diagnosis and neuroscience research, since rich imaging cohorts offer diverse and complementary information for understanding human tissues. However, collecting acquisitions can be limited by adverse factors such as patient discomfort, high cost, and scanner unavailability. In this paper, we propose a multi-task coherent modality transferable GAN (MCMT-GAN) to address this issue for brain MRI synthesis in an unsupervised manner. By combining the bidirectional adversarial loss, cycle-consistency loss, domain-adapted loss, and manifold regularization in a volumetric space, MCMT-GAN is robust for multi-modality brain image synthesis with high visual fidelity. In addition, we complement the discriminators with collaboratively working segmentors, which ensures that our results remain useful for segmentation tasks. Experiments on various cross-modality synthesis tasks show that our method produces visually impressive results that can substitute for real acquisitions in clinical post-processing, and that it outperforms state-of-the-art methods.
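
The cycle-consistency term mentioned above follows the familiar CycleGAN-style formulation; the sketch below (with tiny placeholder generators, not MCMT-GAN's networks) shows the basic reconstruction penalty used for unpaired cross-modality translation.

```python
# Minimal sketch of a cycle-consistency loss for unsupervised cross-modality synthesis.
import torch
import torch.nn as nn

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b):
    # g_ab: modality A -> B generator, g_ba: modality B -> A generator.
    rec_a = g_ba(g_ab(real_a))                  # A -> B -> A should reproduce A
    rec_b = g_ab(g_ba(real_b))                  # B -> A -> B should reproduce B
    l1 = nn.L1Loss()
    return l1(rec_a, real_a) + l1(rec_b, real_b)

# Tiny stand-in generators just to make the sketch runnable.
g_ab = nn.Conv2d(1, 1, kernel_size=3, padding=1)
g_ba = nn.Conv2d(1, 1, kernel_size=3, padding=1)
loss = cycle_consistency_loss(g_ab, g_ba, torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
print(loss.item())
```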

16.
IEEE Trans Cybern ; 50(8): 3627-3639, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31443060

ABSTRACT

Depth information has been demonstrated to be useful for saliency detection. However, the existing methods for RGBD saliency detection mainly focus on designing straightforward and comprehensive models, while ignoring the transferable ability of the existing RGB saliency detection models. In this article, we propose a novel depth-guided transformation model (DTM) going from RGB saliency to RGBD saliency. The proposed model includes three components, that is: 1) multilevel RGBD saliency initialization; 2) depth-guided saliency refinement; and 3) saliency optimization with depth constraints. The explicit depth feature is first utilized in the multilevel RGBD saliency model to initialize the RGBD saliency by combining the global compactness saliency cue and local geodesic saliency cue. The depth-guided saliency refinement is used to further highlight the salient objects and suppress the background regions by introducing the prior depth domain knowledge and prior refined depth shape. Benefiting from the consistency of the entire object in the depth map, we formulate an optimization model to attain more consistent and accurate saliency results via an energy function, which integrates the unary data term, color smooth term, and depth consistency term. Experiments on three public RGBD saliency detection benchmarks demonstrate the effectiveness and performance improvement of the proposed DTM from RGB to RGBD saliency.

17.
Article in English | MEDLINE | ID: mdl-31796402

ABSTRACT

Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image enhancement algorithms have been proposed in the last few years. However, these algorithms are mainly evaluated using either synthetic datasets or a few selected real-world images. It is thus unclear how these algorithms would perform on images acquired in the wild and how we could gauge the progress in the field. To bridge this gap, we present the first comprehensive perceptual study and analysis of underwater image enhancement using large-scale real-world images. In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have corresponding reference images. We treat the remaining 60 underwater images, for which satisfactory reference images could not be obtained, as challenging data. Using this dataset, we conduct a comprehensive study of the state-of-the-art underwater image enhancement algorithms qualitatively and quantitatively. In addition, we propose an underwater image enhancement network (called Water-Net) trained on this benchmark as a baseline, which demonstrates the generalization ability of the proposed UIEB for training Convolutional Neural Networks (CNNs). The benchmark evaluations and the proposed Water-Net demonstrate the performance and limitations of state-of-the-art algorithms, which shed light on future research in underwater image enhancement. The dataset and code are available at.

18.
IEEE Trans Image Process ; 28(10): 4819-4831, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31059438

ABSTRACT

Video saliency detection aims to continuously discover the motion-related salient objects from the video sequences. Since it needs to consider the spatial and temporal constraints jointly, video saliency detection is more challenging than image saliency detection. In this paper, we propose a new method to detect the salient objects in video based on sparse reconstruction and propagation. With the assistance of novel static and motion priors, a single-frame saliency model is first designed to represent the spatial saliency in each individual frame via the sparsity-based reconstruction. Then, through a progressive sparsity-based propagation, the sequential correspondence in the temporal space is captured to produce the inter-frame saliency map. Finally, these two maps are incorporated into a global optimization model to achieve spatio-temporal smoothness and global consistency of the salient object in the whole video. The experiments on three large-scale video saliency datasets demonstrate that the proposed method outperforms the state-of-the-art algorithms both qualitatively and quantitatively.
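
The reconstruction-based notion of saliency can be illustrated with a small NumPy sketch; note that plain least squares stands in here for the paper's sparsity-constrained reconstruction, so this is only a simplified, hypothetical approximation of the idea that poorly reconstructed regions are more salient.

```python
# Illustrative sketch of reconstruction-based saliency: regions that a
# background dictionary reconstructs poorly receive higher saliency scores.
import numpy as np

def reconstruction_saliency(region_feats, background_dict):
    """region_feats: (N, D) region descriptors; background_dict: (K, D) background templates."""
    D = background_dict.T                                          # (D, K)
    # Least-squares coefficients (a sparse solver would be used in practice).
    coeffs, *_ = np.linalg.lstsq(D, region_feats.T, rcond=None)    # (K, N)
    recon = (D @ coeffs).T                                         # (N, D) reconstructions
    errors = np.linalg.norm(region_feats - recon, axis=1)          # reconstruction error per region
    return errors / (errors.max() + 1e-8)                          # normalized saliency in [0, 1]

regions = np.random.rand(10, 16)
background = np.random.rand(4, 16)
print(reconstruction_saliency(regions, background))
```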

19.
IEEE Trans Cybern ; 49(1): 233-246, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29990261

ABSTRACT

As a newly emerging and significant topic in the computer vision community, co-saliency detection aims at discovering the common salient objects in multiple related images. The existing methods often generate the co-saliency map through a direct forward pipeline based on designed cues or initialization, but lack a refinement-cycle scheme. Moreover, they mainly focus on RGB images and ignore the depth information of RGBD images. In this paper, we propose an iterative RGBD co-saliency framework, which utilizes existing single-image saliency maps as the initialization and generates the final RGBD co-saliency map by using a refinement-cycle model. Three schemes are employed in the proposed RGBD co-saliency framework: the addition scheme, the deletion scheme, and the iteration scheme. The addition scheme is used to highlight the salient regions based on intra-image depth propagation and saliency propagation, while the deletion scheme filters the saliency regions and removes the non-common salient regions based on the inter-image constraint. The iteration scheme is proposed to obtain a more homogeneous and consistent co-saliency map. Furthermore, a novel descriptor, named depth shape prior, is proposed in the addition scheme to introduce depth information and enhance the identification of co-salient objects. The proposed method can effectively exploit any existing 2-D saliency model to work well in RGBD co-saliency scenarios. The experiments on two RGBD co-saliency datasets demonstrate the effectiveness of our proposed framework.

20.
Article in English | MEDLINE | ID: mdl-30571635

ABSTRACT

Rapid development of affordable and portable consumer depth cameras facilitates the use of depth information in many computer vision tasks such as intelligent vehicles and 3D reconstruction. However, depth maps captured by low-cost depth sensors (e.g., Kinect) usually suffer from low spatial resolution, which limits their potential applications. In this paper, we propose a novel deep network for depth map super-resolution (SR), called DepthSR-Net. The proposed DepthSR-Net automatically infers a high-resolution (HR) depth map from its low-resolution (LR) version by hierarchical-feature-driven residual learning. Specifically, DepthSR-Net is built on a residual U-Net deep network architecture. Given an LR depth map, we first upsample it to the desired HR size by bicubic interpolation and then construct an input pyramid to achieve multiple levels of receptive fields. Next, we extract hierarchical features from the input pyramid, the intensity image, and the encoder-decoder structure of the U-Net. Finally, we learn the residual between the interpolated depth map and the corresponding HR one using the rich hierarchical features. The final HR depth map is obtained by adding the learned residual to the interpolated depth map. We conduct an ablation study to demonstrate the effectiveness of each component in the proposed network. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art methods. Additionally, the potential usage of the proposed network in other low-level vision problems is discussed.
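
The bicubic-upsample-plus-residual scheme described above can be sketched with a toy PyTorch model (hypothetical layer sizes, not DepthSR-Net itself): the network predicts only the residual high-frequency details, which are added back to the interpolated depth map.

```python
# Minimal sketch of residual learning for depth map super-resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualDepthSR(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, lr_depth):
        up = F.interpolate(lr_depth, scale_factor=self.scale,
                           mode='bicubic', align_corners=False)   # coarse HR estimate
        residual = self.body(up)                                    # learned high-frequency details
        return up + residual                                        # final HR depth map

model = ResidualDepthSR(scale=4)
hr = model(torch.randn(1, 1, 32, 32))
print(hr.shape)  # torch.Size([1, 1, 128, 128])
```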
